Overview

Dataset statistics

Number of variables: 22
Number of observations: 41188
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.9 MiB
Average record size in memory: 176.0 B

Variable types

Categorical (CAT): 11
Numeric (NUM): 11

Warnings

euribor3m is highly correlated with emp.var.rate and 1 other field (High correlation)
emp.var.rate is highly correlated with euribor3m and 1 other field (High correlation)
nr.employed is highly correlated with emp.var.rate and 1 other field (High correlation)
Unnamed: 0 has unique values (Unique)
previous has 35563 (86.3%) zeros (Zeros)
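The Unique and Zeros warnings above can be reproduced from per-column counts. The sketch below is a minimal illustration, not pandas-profiling's implementation. The helper name `flag_columns` and the 5% zeros threshold are hypothetical assumptions. The counts come from the variable sections of this report.

```python
def flag_columns(n_rows, summaries, zeros_threshold=0.05):
    """Return (column, warning) pairs for all-unique and zero-heavy columns.

    summaries maps a column name to (n_distinct, n_zeros).
    zeros_threshold is a hypothetical cut-off, not pandas-profiling's default.
    """
    flags = []
    for column, (n_distinct, n_zeros) in summaries.items():
        # Every row holds a different value.
        if n_distinct == n_rows:
            flags.append((column, "Unique"))
        # Share of exact zeros above the assumed threshold.
        share = n_zeros / n_rows
        if share > zeros_threshold:
            flags.append((column, f"{n_zeros} ({share:.1%}) zeros"))
    return flags


# (distinct values, zeros) as reported in the variable sections of this report
summaries = {
    "Unnamed: 0": (41188, 1),
    "duration": (1544, 4),
    "pdays": (27, 15),
    "previous": (8, 35563),
}

for column, warning in flag_columns(41188, summaries):
    print(column, warning)
```

With these inputs only Unnamed: 0 (Unique) and previous (86.3% zeros) are flagged, matching the list above. duration and pdays have a few zeros but stay well under the assumed threshold.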

Reproduction

Analysis started: 2020-11-14 18:01:40.212978
Analysis finished: 2020-11-14 18:02:19.663952
Duration: 39.45 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

Unnamed: 0
Real number (ℝ≥0)

UNIQUE

Distinct: 41188
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20593.5
Minimum: 0
Maximum: 41187
Zeros: 1
Zeros (%): < 0.1%
Memory size: 321.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 2059.35
Q1: 10296.75
median: 20593.5
Q3: 30890.25
95th percentile: 39127.65
Maximum: 41187
Range: 41187
Interquartile range (IQR): 20593.5

Descriptive statistics

Standard deviation: 11890.09578
Coefficient of variation (CV): 0.5773712958
Kurtosis: -1.2
Mean: 20593.5
Median Absolute Deviation (MAD): 10297
Skewness: 0
Sum: 848205078
Variance: 141374377.7
Monotonicity: Strictly increasing
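Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
2047 | 1 | < 0.1%
34042 | 1 | < 0.1%
38232 | 1 | < 0.1%
11599 | 1 | < 0.1%
9550 | 1 | < 0.1%
15693 | 1 | < 0.1%
13644 | 1 | < 0.1%
3403 | 1 | < 0.1%
1354 | 1 | < 0.1%
7497 | 1 | < 0.1%
Other values (41178) | 41178 | > 99.9%

Minimum 5 values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
41187 | 1 | < 0.1%
41186 | 1 | < 0.1%
41185 | 1 | < 0.1%
41184 | 1 | < 0.1%
41183 | 1 | < 0.1%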
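The statistics above are exactly those of the integer sequence 0, 1, ..., n - 1 with n = 41188. That is a strictly increasing, all-unique counter, which suggests that Unnamed: 0 is a saved row index rather than a feature. The standard-library sketch below recomputes the statistics. It assumes, as the reported numbers imply, that the report uses the sample variance (dividing by n - 1) and linearly interpolated quantiles (the default interpolation of pandas' `Series.quantile`). The helper `linear_quantile` is illustrative.

```python
import math
import statistics

n = 41188
values = list(range(n))  # 0, 1, ..., n - 1, already sorted

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, divides by n - 1
std = math.sqrt(variance)
cv = std / mean

# Closed form for the sample variance of 0..n-1: n(n + 1) / 12
closed_form_variance = n * (n + 1) / 12


def linear_quantile(sorted_values, p):
    """Quantile with linear interpolation between the two closest ranks."""
    position = (len(sorted_values) - 1) * p
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


quantiles = {p: linear_quantile(values, p) for p in (0.05, 0.25, 0.5, 0.75, 0.95)}

print(mean, round(variance, 1), round(std, 5), round(cv, 10))
print(quantiles)
```

If the column is indeed an index written by `DataFrame.to_csv` and read back without an index column, pass `index_col=0` to `pandas.read_csv` (or write the file with `index=False`) to keep it out of the profile.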
 

age
Real number (ℝ≥0)

Distinct: 78
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.02406041
Minimum: 17
Maximum: 98
Zeros: 0
Zeros (%): 0.0%
Memory size: 321.8 KiB

Quantile statistics

Minimum: 17
5th percentile: 26
Q1: 32
median: 38
Q3: 47
95th percentile: 58
Maximum: 98
Range: 81
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 10.42124998
Coefficient of variation (CV): 0.2603746315
Kurtosis: 0.7913115312
Mean: 40.02406041
Median Absolute Deviation (MAD): 7
Skewness: 0.7846968158
Sum: 1648511
Variance: 108.6024512
Monotonicity: Not monotonic
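Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
31 | 1947 | 4.7%
32 | 1846 | 4.5%
33 | 1833 | 4.5%
36 | 1780 | 4.3%
35 | 1759 | 4.3%
34 | 1745 | 4.2%
30 | 1714 | 4.2%
37 | 1475 | 3.6%
29 | 1453 | 3.5%
39 | 1432 | 3.5%
Other values (68) | 24204 | 58.8%

Minimum 5 values
Value | Count | Frequency (%)
17 | 5 | < 0.1%
18 | 28 | 0.1%
19 | 42 | 0.1%
20 | 65 | 0.2%
21 | 102 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
98 | 2 | < 0.1%
95 | 1 | < 0.1%
94 | 1 | < 0.1%
92 | 4 | < 0.1%
91 | 2 | < 0.1%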

job
Categorical

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 321.8 KiB

Common values
Value | Count | Frequency (%)
"admin." | 10422 | 25.3%
"blue-collar" | 9254 | 22.5%
"technician" | 6743 | 16.4%
"services" | 3969 | 9.6%
"management" | 2924 | 7.1%
"retired" | 1720 | 4.2%
"entrepreneur" | 1456 | 3.5%
"self-employed" | 1421 | 3.5%
"housemaid" | 1060 | 2.6%
"unemployed" | 1014 | 2.5%
Other values (2) | 1205 | 2.9%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 15
Median length: 12
Mean length: 10.95522968
Min length: 8

marital
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 321.8 KiB

Common values
Value | Count | Frequency (%)
"married" | 24928 | 60.5%
"single" | 11568 | 28.1%
"divorced" | 4612 | 11.2%
"unknown" | 80 | 0.2%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 10
Median length: 9
Mean length: 8.831115859
Min length: 8

education
Categorical

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 321.8 KiB

Common values
Value | Count | Frequency (%)
"university.degree" | 12168 | 29.5%
"high.school" | 9515 | 23.1%
"basic.9y" | 6045 | 14.7%
"professional.course" | 5243 | 12.7%
"basic.4y" | 4176 | 10.1%
"basic.6y" | 2292 | 5.6%
"unknown" | 1731 | 4.2%
"illiterate" | 18 | < 0.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 21
Median length: 13
Mean length: 14.7109595
Min length: 9

default
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 321.8 KiB

Common values
Value | Count | Frequency (%)
"no" | 32588 | 79.1%
"unknown" | 8597 | 20.9%
"yes" | 3 | < 0.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 9
Median length: 4
Mean length: 5.043702049
Min length: 4

housing
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 321.8 KiB

Common values
Value | Count | Frequency (%)
"yes" | 21576 | 52.4%
"no" | 18622 | 45.2%
"unknown" | 990 | 2.4%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 9
Median length: 5
Mean length: 4.644022531
Min length: 4

loan
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 321.8 KiB

Common values
Value | Count | Frequency (%)
"no" | 33950 | 82.4%
"yes" | 6248 | 15.2%
"unknown" | 990 | 2.4%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 9
Median length: 4
Mean length: 4.271875303
Min length: 4

contact
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 321.8 KiB

Common values
Value | Count | Frequency (%)
"cellular" | 26144 | 63.5%
"telephone" | 15044 | 36.5%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 11
Median length: 10
Mean length: 10.36525202
Min length: 10

month
Categorical

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 321.8 KiB

Common values
Value | Count | Frequency (%)
"may" | 13769 | 33.4%
"jul" | 7174 | 17.4%
"aug" | 6178 | 15.0%
"jun" | 5318 | 12.9%
"nov" | 4101 | 10.0%
"apr" | 2632 | 6.4%
"oct" | 718 | 1.7%
"sep" | 570 | 1.4%
"mar" | 546 | 1.3%
"dec" | 182 | 0.4%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5
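Each value table lists the most frequent values. When more distinct values exist, it folds the rest into an "Other values (k)" row, as with Other values (2) in the job table. Percentages have one decimal place. Judging from this report, a nonzero share that would round to 0.0% is printed as < 0.1% (age 17, 5 rows), while 28 rows round up to 0.1% (age 18). The sketch below rebuilds such a table from the month counts above. The helpers `common_values` and `format_frequency` are hypothetical, and their display rules were inferred from this report, not taken from pandas-profiling's source.

```python
from collections import Counter


def format_frequency(share):
    """One-decimal percentage; nonzero shares that round to 0.0% or 100.0% get < / > labels."""
    text = f"{share:.1%}"
    if share > 0 and text == "0.0%":
        return "< 0.1%"
    if share < 1 and text == "100.0%":
        return "> 99.9%"
    return text


def common_values(counts, top_n=10):
    """(value, count, frequency) rows for the top_n values plus an 'Other values (k)' row."""
    total = sum(counts.values())
    rows = [(value, count, format_frequency(count / total))
            for value, count in counts.most_common(top_n)]
    n_other = len(counts) - len(rows)
    if n_other > 0:
        other_count = total - sum(count for _, count, _ in rows)
        rows.append((f"Other values ({n_other})", other_count,
                     format_frequency(other_count / total)))
    return rows


# Month counts from the table above (surrounding quotes omitted)
month_counts = Counter({
    "may": 13769, "jul": 7174, "aug": 6178, "jun": 5318, "nov": 4101,
    "apr": 2632, "oct": 718, "sep": 570, "mar": 546, "dec": 182,
})

for row in common_values(month_counts, top_n=5):
    print(*row, sep=" | ")
```

With `top_n=5` the last row is Other values (5) | 4648 | 11.3%. With the default of 10, all ten months are listed and no "Other values" row is needed.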

day_of_week
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 321.8 KiB

Common values
Value | Count | Frequency (%)
"thu" | 8623 | 20.9%
"mon" | 8514 | 20.7%
"wed" | 8134 | 19.7%
"tue" | 8090 | 19.6%
"fri" | 7827 | 19.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5
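The length statistics suggest that the stored category strings include their surrounding double quotes. The month and day_of_week values are three-letter abbreviations, yet both columns have minimum and maximum length 5. Likewise, "admin." has six characters, while job's minimum length is 8. Assuming those quotes are an artifact of how the file was read rather than intended data, a minimal cleanup sketch follows. `strip_quotes` is a hypothetical helper.

```python
def strip_quotes(value):
    """Remove one pair of surrounding double quotes, if both are present."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


# Values as the length statistics suggest they are stored
stored = ['"mon"', '"tue"', '"wed"', '"thu"', '"fri"']
cleaned = [strip_quotes(v) for v in stored]

print(cleaned)
print(sorted({len(v) for v in stored}), sorted({len(v) for v in cleaned}))
```

In pandas, applying `Series.str.strip('"')` to each categorical column has a similar effect. It differs in removing every leading and trailing quote character rather than exactly one pair.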

duration
Real number (ℝ≥0)

Distinct: 1544
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 258.2850102
Minimum: 0
Maximum: 4918
Zeros: 4
Zeros (%): < 0.1%
Memory size: 321.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 36
Q1: 102
median: 180
Q3: 319
95th percentile: 752.65
Maximum: 4918
Range: 4918
Interquartile range (IQR): 217

Descriptive statistics

Standard deviation: 259.2792488
Coefficient of variation (CV): 1.003849386
Kurtosis: 20.24793801
Mean: 258.2850102
Median Absolute Deviation (MAD): 94
Skewness: 3.263141255
Sum: 10638243
Variance: 67225.72888
Monotonicity: Not monotonic
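Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
85 | 170 | 0.4%
90 | 170 | 0.4%
136 | 168 | 0.4%
73 | 167 | 0.4%
124 | 164 | 0.4%
87 | 162 | 0.4%
72 | 161 | 0.4%
104 | 161 | 0.4%
111 | 160 | 0.4%
106 | 159 | 0.4%
Other values (1534) | 39546 | 96.0%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4 | < 0.1%
1 | 3 | < 0.1%
2 | 1 | < 0.1%
3 | 3 | < 0.1%
4 | 12 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
4918 | 1 | < 0.1%
4199 | 1 | < 0.1%
3785 | 1 | < 0.1%
3643 | 1 | < 0.1%
3631 | 1 | < 0.1%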

campaign
Real number (ℝ≥0)

Distinct: 42
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.567592503
Minimum: 1
Maximum: 56
Zeros: 0
Zeros (%): 0.0%
Memory size: 321.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
median: 2
Q3: 3
95th percentile: 7
Maximum: 56
Range: 55
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.770013543
Coefficient of variation (CV): 1.078836903
Kurtosis: 36.97979514
Mean: 2.567592503
Median Absolute Deviation (MAD): 1
Skewness: 4.762506697
Sum: 105754
Variance: 7.672975028
Monotonicity: Not monotonic
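Histogram with fixed size bins (bins=42)

Common values
Value | Count | Frequency (%)
1 | 17642 | 42.8%
2 | 10570 | 25.7%
3 | 5341 | 13.0%
4 | 2651 | 6.4%
5 | 1599 | 3.9%
6 | 979 | 2.4%
7 | 629 | 1.5%
8 | 400 | 1.0%
9 | 283 | 0.7%
10 | 225 | 0.5%
Other values (32) | 869 | 2.1%

Minimum 5 values
Value | Count | Frequency (%)
1 | 17642 | 42.8%
2 | 10570 | 25.7%
3 | 5341 | 13.0%
4 | 2651 | 6.4%
5 | 1599 | 3.9%

Maximum 5 values
Value | Count | Frequency (%)
56 | 1 | < 0.1%
43 | 2 | < 0.1%
42 | 2 | < 0.1%
41 | 1 | < 0.1%
40 | 2 | < 0.1%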

pdays
Real number (ℝ≥0)

Distinct: 27
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 962.475454
Minimum: 0
Maximum: 999
Zeros: 15
Zeros (%): < 0.1%
Memory size: 321.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 999
Q1: 999
median: 999
Q3: 999
95th percentile: 999
Maximum: 999
Range: 999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 186.9109073
Coefficient of variation (CV): 0.194198103
Kurtosis: 22.22946263
Mean: 962.475454
Median Absolute Deviation (MAD): 0
Skewness: -4.922189916
Sum: 39642439
Variance: 34935.68728
Monotonicity: Not monotonic
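Histogram with fixed size bins (bins=27)

Common values
Value | Count | Frequency (%)
999 | 39673 | 96.3%
3 | 439 | 1.1%
6 | 412 | 1.0%
4 | 118 | 0.3%
9 | 64 | 0.2%
2 | 61 | 0.1%
7 | 60 | 0.1%
12 | 58 | 0.1%
10 | 52 | 0.1%
5 | 46 | 0.1%
Other values (17) | 205 | 0.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 15 | < 0.1%
1 | 26 | 0.1%
2 | 61 | 0.1%
3 | 439 | 1.1%
4 | 118 | 0.3%

Maximum 5 values
Value | Count | Frequency (%)
999 | 39673 | 96.3%
27 | 1 | < 0.1%
26 | 1 | < 0.1%
25 | 1 | < 0.1%
22 | 3 | < 0.1%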
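This report appears to describe the UCI Bank Marketing data. There, pdays = 999 is documented as meaning that the client was not previously contacted. If that assumption holds, treating the sentinel as a real number of days dominates the summary, because 39673 of the 41188 rows hold 999. The arithmetic below uses only the Sum and counts reported above to recover the mean over the remaining rows.

```python
n_rows = 41188
total = 39642439     # "Sum" reported for pdays
sentinel = 999       # assumed to mean "not previously contacted"
n_sentinel = 39673   # count of the value 999 in the tables above

n_contacted = n_rows - n_sentinel
sum_contacted = total - sentinel * n_sentinel
mean_contacted = sum_contacted / n_contacted

print(f"{n_contacted} rows without the sentinel, mean {mean_contacted:.2f} days")
```

The 1515 non-sentinel rows average about 6.0 days, against the reported overall mean of 962.5. In pandas, `df["pdays"].replace(999, np.nan)` or a separate "previously contacted" flag keeps the sentinel out of the numeric statistics.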
 

previous
Real number (ℝ≥0)

ZEROS

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1729629989
Minimum: 0
Maximum: 7
Zeros: 35563
Zeros (%): 86.3%
Memory size: 321.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 0
95th percentile: 1
Maximum: 7
Range: 7
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.4949010798
Coefficient of variation (CV): 2.861311858
Kurtosis: 20.10881622
Mean: 0.1729629989
Median Absolute Deviation (MAD): 0
Skewness: 3.832042243
Sum: 7124
Variance: 0.2449270788
Monotonicity: Not monotonic
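Histogram with fixed size bins (bins=8)

Common values
Value | Count | Frequency (%)
0 | 35563 | 86.3%
1 | 4561 | 11.1%
2 | 754 | 1.8%
3 | 216 | 0.5%
4 | 70 | 0.2%
5 | 18 | < 0.1%
6 | 5 | < 0.1%
7 | 1 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 35563 | 86.3%
1 | 4561 | 11.1%
2 | 754 | 1.8%
3 | 216 | 0.5%
4 | 70 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
7 | 1 | < 0.1%
6 | 5 | < 0.1%
5 | 18 | < 0.1%
4 | 70 | 0.2%
3 | 216 | 0.5%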
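Because previous has only eight distinct values, the tables above list every value, so its summary can be recomputed exactly from the counts. The standard-library sketch below assumes, as the reported variance implies, that the sample variance (dividing by n - 1) is used. The variable names are illustrative.

```python
import math

# Value -> count for "previous", taken from the value table above
counts = {0: 35563, 1: 4561, 2: 754, 3: 216, 4: 70, 5: 18, 6: 5, 7: 1}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n

# Sample variance (divides by n - 1), weighting each value by its count
squared_deviations = sum(count * (value - mean) ** 2 for value, count in counts.items())
variance = squared_deviations / (n - 1)
std = math.sqrt(variance)

zeros_share = counts[0] / n

print(n, total, round(mean, 10), round(variance, 10), round(std, 10), f"{zeros_share:.1%}")
```

The recomputed count, sum, mean, variance, standard deviation and zeros share agree with the corresponding rows above.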
 

poutcome
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 321.8 KiB

Common values
Value | Count | Frequency (%)
"nonexistent" | 35563 | 86.3%
"failure" | 4252 | 10.3%
"success" | 1373 | 3.3%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 13
Median length: 13
Mean length: 12.45372439
Min length: 9

emp.var.rate
Real number (ℝ)

HIGH CORRELATION

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.08188550063
Minimum: -3.4
Maximum: 1.4
Zeros: 0
Zeros (%): 0.0%
Memory size: 321.8 KiB

Quantile statistics

Minimum: -3.4
5th percentile: -2.9
Q1: -1.8
median: 1.1
Q3: 1.4
95th percentile: 1.4
Maximum: 1.4
Range: 4.8
Interquartile range (IQR): 3.2

Descriptive statistics

Standard deviation: 1.570959741
Coefficient of variation (CV): 19.18483405
Kurtosis: -1.062631525
Mean: 0.08188550063
Median Absolute Deviation (MAD): 0.3
Skewness: -0.7240955492
Sum: 3372.7
Variance: 2.467914506
Monotonicity: Not monotonic
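Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
1.4 | 16234 | 39.4%
-1.8 | 9184 | 22.3%
1.1 | 7763 | 18.8%
-0.1 | 3683 | 8.9%
-2.9 | 1663 | 4.0%
-3.4 | 1071 | 2.6%
-1.7 | 773 | 1.9%
-1.1 | 635 | 1.5%
-3 | 172 | 0.4%
-0.2 | 10 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-3.4 | 1071 | 2.6%
-3 | 172 | 0.4%
-2.9 | 1663 | 4.0%
-1.8 | 9184 | 22.3%
-1.7 | 773 | 1.9%

Maximum 5 values
Value | Count | Frequency (%)
1.4 | 16234 | 39.4%
1.1 | 7763 | 18.8%
-0.1 | 3683 | 8.9%
-0.2 | 10 | < 0.1%
-1.1 | 635 | 1.5%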

cons.price.idx
Real number (ℝ≥0)

Distinct: 26
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 93.57566437
Minimum: 92.201
Maximum: 94.767
Zeros: 0
Zeros (%): 0.0%
Memory size: 321.8 KiB

Quantile statistics

Minimum: 92.201
5th percentile: 92.713
Q1: 93.075
median: 93.749
Q3: 93.994
95th percentile: 94.465
Maximum: 94.767
Range: 2.566
Interquartile range (IQR): 0.919

Descriptive statistics

Standard deviation: 0.578840049
Coefficient of variation (CV): 0.00618579684
Kurtosis: -0.8298085772
Mean: 93.57566437
Median Absolute Deviation (MAD): 0.38
Skewness: -0.2308876514
Sum: 3854194.464
Variance: 0.3350558023
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=26)

Common values
Value | Count | Frequency (%)
93.994 | 7763 | 18.8%
93.918 | 6685 | 16.2%
92.893 | 5794 | 14.1%
93.444 | 5175 | 12.6%
94.465 | 4374 | 10.6%
93.2 | 3616 | 8.8%
93.075 | 2458 | 6.0%
92.201 | 770 | 1.9%
92.963 | 715 | 1.7%
92.431 | 447 | 1.1%
Other values (16) | 3391 | 8.2%

Minimum 5 values
Value | Count | Frequency (%)
92.201 | 770 | 1.9%
92.379 | 267 | 0.6%
92.431 | 447 | 1.1%
92.469 | 178 | 0.4%
92.649 | 357 | 0.9%

Maximum 5 values
Value | Count | Frequency (%)
94.767 | 128 | 0.3%
94.601 | 204 | 0.5%
94.465 | 4374 | 10.6%
94.215 | 311 | 0.8%
94.199 | 303 | 0.7%

cons.conf.idx
Real number (ℝ)

Distinct: 26
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -40.50260027
Minimum: -50.8
Maximum: -26.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 321.8 KiB

Quantile statistics

Minimum: -50.8
5th percentile: -47.1
Q1: -42.7
median: -41.8
Q3: -36.4
95th percentile: -33.6
Maximum: -26.9
Range: 23.9
Interquartile range (IQR): 6.3

Descriptive statistics

Standard deviation: 4.628197856
Coefficient of variation (CV): -0.1142691537
Kurtosis: -0.3585583105
Mean: -40.50260027
Median Absolute Deviation (MAD): 4.4
Skewness: 0.3031798587
Sum: -1668221.1
Variance: 21.4202154
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=26)

Common values
Value | Count | Frequency (%)
-36.4 | 7763 | 18.8%
-42.7 | 6685 | 16.2%
-46.2 | 5794 | 14.1%
-36.1 | 5175 | 12.6%
-41.8 | 4374 | 10.6%
-42 | 3616 | 8.8%
-47.1 | 2458 | 6.0%
-31.4 | 770 | 1.9%
-40.8 | 715 | 1.7%
-26.9 | 447 | 1.1%
Other values (16) | 3391 | 8.2%

Minimum 5 values
Value | Count | Frequency (%)
-50.8 | 128 | 0.3%
-50 | 282 | 0.7%
-49.5 | 204 | 0.5%
-47.1 | 2458 | 6.0%
-46.2 | 5794 | 14.1%

Maximum 5 values
Value | Count | Frequency (%)
-26.9 | 447 | 1.1%
-29.8 | 267 | 0.6%
-30.1 | 357 | 0.9%
-31.4 | 770 | 1.9%
-33 | 172 | 0.4%

euribor3m
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 316
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.621290813
Minimum: 0.634
Maximum: 5.045
Zeros: 0
Zeros (%): 0.0%
Memory size: 321.8 KiB

Quantile statistics

Minimum: 0.634
5th percentile: 0.797
Q1: 1.344
median: 4.857
Q3: 4.961
95th percentile: 4.966
Maximum: 5.045
Range: 4.411
Interquartile range (IQR): 3.617

Descriptive statistics

Standard deviation: 1.734447405
Coefficient of variation (CV): 0.4789583313
Kurtosis: -1.406802622
Mean: 3.621290813
Median Absolute Deviation (MAD): 0.108
Skewness: -0.7091879564
Sum: 149153.726
Variance: 3.0083078
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
4.857 | 2868 | 7.0%
4.962 | 2613 | 6.3%
4.963 | 2487 | 6.0%
4.961 | 1902 | 4.6%
4.856 | 1210 | 2.9%
4.964 | 1175 | 2.9%
1.405 | 1169 | 2.8%
4.965 | 1071 | 2.6%
4.864 | 1044 | 2.5%
4.96 | 1013 | 2.5%
Other values (306) | 24636 | 59.8%

Minimum 5 values
Value | Count | Frequency (%)
0.634 | 8 | < 0.1%
0.635 | 43 | 0.1%
0.636 | 14 | < 0.1%
0.637 | 6 | < 0.1%
0.638 | 7 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
5.045 | 9 | < 0.1%
5 | 7 | < 0.1%
4.97 | 172 | 0.4%
4.968 | 992 | 2.4%
4.967 | 643 | 1.6%

nr.employed
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5167.035911
Minimum: 4963.6
Maximum: 5228.1
Zeros: 0
Zeros (%): 0.0%
Memory size: 321.8 KiB

Quantile statistics

Minimum: 4963.6
5th percentile: 5017.5
Q1: 5099.1
median: 5191
Q3: 5228.1
95th percentile: 5228.1
Maximum: 5228.1
Range: 264.5
Interquartile range (IQR): 129

Descriptive statistics

Standard deviation: 72.25152767
Coefficient of variation (CV): 0.01398316732
Kurtosis: -0.003760375696
Mean: 5167.035911
Median Absolute Deviation (MAD): 37.1
Skewness: -1.044262407
Sum: 212819875.1
Variance: 5220.28325
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)

Common values
Value | Count | Frequency (%)
5228.1 | 16234 | 39.4%
5099.1 | 8534 | 20.7%
5191 | 7763 | 18.8%
5195.8 | 3683 | 8.9%
5076.2 | 1663 | 4.0%
5017.5 | 1071 | 2.6%
4991.6 | 773 | 1.9%
5008.7 | 650 | 1.6%
4963.6 | 635 | 1.5%
5023.5 | 172 | 0.4%

Minimum 5 values
Value | Count | Frequency (%)
4963.6 | 635 | 1.5%
4991.6 | 773 | 1.9%
5008.7 | 650 | 1.6%
5017.5 | 1071 | 2.6%
5023.5 | 172 | 0.4%

Maximum 5 values
Value | Count | Frequency (%)
5228.1 | 16234 | 39.4%
5195.8 | 3683 | 8.9%
5191 | 7763 | 18.8%
5176.3 | 10 | < 0.1%
5099.1 | 8534 | 20.7%
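The histogram headings in this report use 50 bins unless a variable has fewer distinct values. In that case the bin count equals the number of distinct values, for example 42 for campaign, 27 for pdays and 8 for previous. The sketch below encodes that observed rule and checks it against every numeric variable. The rule is inferred from this report; `histogram_bins` and the cap of 50 are assumptions, not pandas-profiling's documented behavior.

```python
def histogram_bins(n_distinct, max_bins=50):
    """Bin count consistent with the histogram headings in this report (inferred rule)."""
    return min(max_bins, n_distinct)


# (distinct values, bins shown in the heading) for each numeric variable above
reported = {
    "Unnamed: 0": (41188, 50),
    "age": (78, 50),
    "duration": (1544, 50),
    "campaign": (42, 42),
    "pdays": (27, 27),
    "previous": (8, 8),
    "emp.var.rate": (10, 10),
    "cons.price.idx": (26, 26),
    "cons.conf.idx": (26, 26),
    "euribor3m": (316, 50),
    "nr.employed": (11, 11),
}

for name, (n_distinct, shown) in reported.items():
    status = "matches" if histogram_bins(n_distinct) == shown else "differs"
    print(f"{name}: distinct={n_distinct}, bins={shown} ({status})")
```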
 

y
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 321.8 KiB

Common values
Value | Count | Frequency (%)
"no" | 36548 | 88.7%
"yes" | 4640 | 11.3%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 5
Median length: 4
Mean length: 4.112654171
Min length: 4

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive scale of the two variables: replacing X with a + bX and Y with c + dY, where b > 0 and d > 0, leaves r unchanged. In particular, points lying exactly on a line with positive slope give r = 1, however steep the line is.

To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations: r = cov(X, Y) / (σ_X σ_Y).
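A minimal pure-Python sketch of this definition follows, with a check of the location-and-scale invariance. `pearson_r` is an illustrative helper, not the report's implementation.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    mean_x = math.fsum(xs) / len(xs)
    mean_y = math.fsum(ys) / len(ys)
    # Sums of cross-products and squares; the common 1/(n - 1) factors cancel in the ratio.
    s_xy = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = math.fsum((x - mean_x) ** 2 for x in xs)
    s_yy = math.fsum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(xs, ys)

# Shifting and positively rescaling each variable separately leaves r unchanged.
r_transformed = pearson_r([10 + 3 * x for x in xs], [-7 + 0.5 * y for y in ys])

# Points on a steep or a shallow line with positive slope both give r = 1.
r_steep = pearson_r(xs, [100 * x for x in xs])
r_shallow = pearson_r(xs, [0.01 * x for x in xs])

print(r, r_transformed, r_steep, r_shallow)
```

Here r is 0.8 before and after the transformation. Flipping the sign of one variable flips the sign of r, which is why the invariance requires positive scale factors.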

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations; in other words, ρ is Pearson's r applied to the ranks.
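The rank-then-correlate recipe can be sketched in pure Python. Tied values receive the average of their ranks, the usual convention and also the default of pandas' `Series.rank` (`method="average"`). The helper names are illustrative, and `pearson_r` is repeated so the block runs on its own.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    mean_x = math.fsum(xs) / len(xs)
    mean_y = math.fsum(ys) / len(ys)
    s_xy = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = math.fsum((x - mean_x) ** 2 for x in xs)
    s_yy = math.fsum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # nonlinear but strictly increasing
print(round(spearman_rho(xs, ys), 6), round(pearson_r(xs, ys), 6))
```

For this strictly increasing but curved relationship, ρ is exactly 1 while Pearson's r is noticeably lower, illustrating the difference described above.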

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, count the concordant pairs of observations (both variables change in the same direction) and the discordant pairs (they change in opposite directions). τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
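The counting rule translates directly into a pure-Python sketch. This is the variant without a tie correction, often called τ-a. SciPy's `scipy.stats.kendalltau` defaults to τ-b, which adjusts the denominator for ties, so results differ when ties are present. `kendall_tau_a` is an illustrative helper.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau without tie correction: (concordant - discordant) / all pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same sign of change in both variables -> concordant; opposite -> discordant.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie in x or y: counted as neither.
    return (concordant - discordant) / (n * (n - 1) // 2)


xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]  # one swapped pair: 5 concordant, 1 discordant out of 6
print(kendall_tau_a(xs, ys))
```

With 5 concordant and 1 discordant pair out of 6, τ = 4/6 ≈ 0.667.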

Phik (φk)

Phik (φk) is a recent, practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimator of Cramér's V has been shown to be biased, even for large samples, so the report uses the bias-corrected measure proposed by Bergsma (2013).
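A standard-library sketch of the bias-corrected estimator follows, using the Bergsma (2013) correction: φ² is reduced by its expected bias under independence, and the table dimensions are shrunk accordingly. The helper names are illustrative. Computing χ² without Yates' continuity correction is an assumption, and the report's exact settings may differ.

```python
import math


def chi_squared(table):
    """Pearson's chi-squared statistic (no continuity correction) and the total count."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            if expected > 0:  # an empty row or column contributes nothing
                stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    chi2, n = chi_squared(table)
    r, k = len(table), len(table[0])
    phi2 = chi2 / n
    # Remove the expected bias of phi^2 under independence, clipped at zero.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Matching corrections for the table dimensions.
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(k_corr - 1, r_corr - 1)
    if denominator <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denominator)


perfect = [[50, 0], [0, 50]]
independent = [[25, 25], [25, 25]]
moderate = [[30, 10], [10, 30]]
for name, table in [("perfect", perfect), ("independent", independent), ("moderate", moderate)]:
    print(name, round(cramers_v_corrected(table), 4))
```

For the moderate table the uncorrected V is 0.5, and the correction lowers it to about 0.49. Because every correction term scales with 1/(n - 1), the adjustment grows as samples get smaller.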

Missing values

Sample

First rows

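index | Unnamed: 0 | age | job | marital | education | default | housing | loan | contact | month | day_of_week | duration | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y
0 | 0 | 56 | "housemaid" | "married" | "basic.4y" | "no" | "no" | "no" | "telephone" | "may" | "mon" | 261 | 1 | 999 | 0 | "nonexistent" | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | "no"
1 | 1 | 57 | "services" | "married" | "high.school" | "unknown" | "no" | "no" | "telephone" | "may" | "mon" | 149 | 1 | 999 | 0 | "nonexistent" | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | "no"
2 | 2 | 37 | "services" | "married" | "high.school" | "no" | "yes" | "no" | "telephone" | "may" | "mon" | 226 | 1 | 999 | 0 | "nonexistent" | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | "no"
3 | 3 | 40 | "admin." | "married" | "basic.6y" | "no" | "no" | "no" | "telephone" | "may" | "mon" | 151 | 1 | 999 | 0 | "nonexistent" | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | "no"
4 | 4 | 56 | "services" | "married" | "high.school" | "no" | "no" | "yes" | "telephone" | "may" | "mon" | 307 | 1 | 999 | 0 | "nonexistent" | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | "no"
5 | 5 | 45 | "services" | "married" | "basic.9y" | "unknown" | "no" | "no" | "telephone" | "may" | "mon" | 198 | 1 | 999 | 0 | "nonexistent" | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | "no"
6 | 6 | 59 | "admin." | "married" | "professional.course" | "no" | "no" | "no" | "telephone" | "may" | "mon" | 139 | 1 | 999 | 0 | "nonexistent" | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | "no"
7 | 7 | 41 | "blue-collar" | "married" | "unknown" | "unknown" | "no" | "no" | "telephone" | "may" | "mon" | 217 | 1 | 999 | 0 | "nonexistent" | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | "no"
8 | 8 | 24 | "technician" | "single" | "professional.course" | "no" | "yes" | "no" | "telephone" | "may" | "mon" | 380 | 1 | 999 | 0 | "nonexistent" | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | "no"
9 | 9 | 25 | "services" | "single" | "high.school" | "no" | "yes" | "no" | "telephone" | "may" | "mon" | 50 | 1 | 999 | 0 | "nonexistent" | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | "no"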

Last rows

index | Unnamed: 0 | age | job | marital | education | default | housing | loan | contact | month | day_of_week | duration | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y
41178 | 41178 | 62 | "retired" | "married" | "university.degree" | "no" | "no" | "no" | "cellular" | "nov" | "thu" | 483 | 2 | 6 | 3 | "success" | -1.1 | 94.767 | -50.8 | 1.031 | 4963.6 | "yes"
41179 | 41179 | 64 | "retired" | "divorced" | "professional.course" | "no" | "yes" | "no" | "cellular" | "nov" | "fri" | 151 | 3 | 999 | 0 | "nonexistent" | -1.1 | 94.767 | -50.8 | 1.028 | 4963.6 | "no"
41180 | 41180 | 36 | "admin." | "married" | "university.degree" | "no" | "no" | "no" | "cellular" | "nov" | "fri" | 254 | 2 | 999 | 0 | "nonexistent" | -1.1 | 94.767 | -50.8 | 1.028 | 4963.6 | "no"
41181 | 41181 | 37 | "admin." | "married" | "university.degree" | "no" | "yes" | "no" | "cellular" | "nov" | "fri" | 281 | 1 | 999 | 0 | "nonexistent" | -1.1 | 94.767 | -50.8 | 1.028 | 4963.6 | "yes"
41182 | 41182 | 29 | "unemployed" | "single" | "basic.4y" | "no" | "yes" | "no" | "cellular" | "nov" | "fri" | 112 | 1 | 9 | 1 | "success" | -1.1 | 94.767 | -50.8 | 1.028 | 4963.6 | "no"
41183 | 41183 | 73 | "retired" | "married" | "professional.course" | "no" | "yes" | "no" | "cellular" | "nov" | "fri" | 334 | 1 | 999 | 0 | "nonexistent" | -1.1 | 94.767 | -50.8 | 1.028 | 4963.6 | "yes"
41184 | 41184 | 46 | "blue-collar" | "married" | "professional.course" | "no" | "no" | "no" | "cellular" | "nov" | "fri" | 383 | 1 | 999 | 0 | "nonexistent" | -1.1 | 94.767 | -50.8 | 1.028 | 4963.6 | "no"
41185 | 41185 | 56 | "retired" | "married" | "university.degree" | "no" | "yes" | "no" | "cellular" | "nov" | "fri" | 189 | 2 | 999 | 0 | "nonexistent" | -1.1 | 94.767 | -50.8 | 1.028 | 4963.6 | "no"
41186 | 41186 | 44 | "technician" | "married" | "professional.course" | "no" | "no" | "no" | "cellular" | "nov" | "fri" | 442 | 1 | 999 | 0 | "nonexistent" | -1.1 | 94.767 | -50.8 | 1.028 | 4963.6 | "yes"
41187 | 41187 | 74 | "retired" | "married" | "professional.course" | "no" | "yes" | "no" | "cellular" | "nov" | "fri" | 239 | 3 | 999 | 1 | "failure" | -1.1 | 94.767 | -50.8 | 1.028 | 4963.6 | "no"